Research outputs

    Free-rider Attacks on Model Aggregation in Federated Learning

    Get PDF
    Accepted at AISTATS 2021. Free-rider attacks against federated learning consist in dissimulating participation in the federated learning process with the goal of obtaining the final aggregated model without actually contributing any data. This kind of attack is critical in sensitive applications of federated learning, where data is scarce and the model has high commercial value. We introduce here the first theoretical and experimental analysis of free-rider attacks on federated learning schemes based on iterative parameter aggregation, such as FedAvg or FedProx, and provide formal guarantees for these attacks to converge to the aggregated models of the fair participants. We first show that a straightforward implementation of this attack can be achieved simply by not updating the local parameters during the iterative federated optimization. As this attack can be detected by adopting simple countermeasures at the server level, we subsequently study more complex disguising schemes based on stochastic updates of the free-rider parameters. We demonstrate the proposed strategies in a number of experimental scenarios, in both iid and non-iid settings. We conclude by providing recommendations to avoid free-rider attacks in real-world applications of federated learning, especially in sensitive domains where the security of data and models is critical.
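    A minimal sketch of the disguised attack described above, assuming a FedAvg-style loop: instead of training, the free-rider returns the received global model plus Gaussian noise whose scale decays over rounds, mimicking the shrinking magnitude of honest SGD updates. The function names and the decay schedule (sigma0, gamma) are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def fair_client_update(global_params):
    """Placeholder for an honest client's local update: in a real FL
    system this would run several SGD steps on the client's data."""
    return global_params - 0.01 * np.random.randn(*global_params.shape)

def free_rider_update(global_params, round_t, sigma0=0.1, gamma=0.5):
    """Disguised free-rider update: no training, only stochastic noise.

    The free-rider returns the received global parameters perturbed by
    Gaussian noise whose variance decays with the round index, so the
    fake updates look like converging SGD updates to the server.
    """
    scale = sigma0 * (round_t + 1) ** (-gamma)
    return global_params + scale * np.random.randn(*global_params.shape)

# Toy FedAvg loop with one honest client and one free-rider.
rng = np.random.default_rng(0)
theta = rng.standard_normal(10)            # global model parameters
for t in range(100):
    updates = [fair_client_update(theta), free_rider_update(theta, t)]
    theta = np.mean(updates, axis=0)       # uniform FedAvg aggregation
```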

    Sequential Informed Federated Unlearning: Efficient and Provable Client Unlearning in Federated Optimization

    Full text link
    The aim of Machine Unlearning (MU) is to provide theoretical guarantees on the removal of the contribution of a given data point from a training procedure. Federated Unlearning (FU) consists in extending MU to unlearn a given client's contribution from a federated training routine. Current FU approaches are generally not scalable, and do not come with sound theoretical quantification of the effectiveness of unlearning. In this work we present Informed Federated Unlearning (IFU), a novel efficient and quantifiable FU approach. Upon an unlearning request from a given client, IFU identifies the optimal FL iteration from which FL has to be reinitialized, with unlearning guarantees obtained through a randomized perturbation mechanism. The theory of IFU is also extended to account for sequential unlearning requests. Experimental results on different tasks and datasets show that IFU leads to more efficient unlearning procedures as compared to basic retraining and state-of-the-art FU approaches.
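    A rough sketch of the idea, under simplifying assumptions of our own: the server keeps per-round checkpoints together with a record of the target client's per-round contribution, rewinds to the last round whose cumulative contribution is still within an unlearning budget, and perturbs that checkpoint with Gaussian noise before resuming training without the client. The contribution bound and noise calibration below are placeholders, not the paper's actual sensitivity analysis.

```python
import numpy as np

def ifu_rewind(checkpoints, client_contribs, budget, noise_scale):
    """Sketch of Informed Federated Unlearning (IFU).

    checkpoints:     list of global models, one per FL round
    client_contribs: per-round norm of the target client's contribution,
                     recorded during training
    budget:          rewind to the last round whose cumulative
                     contribution is still below this budget
    noise_scale:     std of the Gaussian perturbation supplying the
                     randomized unlearning guarantee
    """
    cumulative = np.cumsum(client_contribs)
    # Latest safe reinitialization point (round 0 if none qualifies).
    safe_rounds = np.nonzero(cumulative <= budget)[0]
    t_star = int(safe_rounds[-1]) if safe_rounds.size else 0
    theta = checkpoints[t_star]
    # Randomized perturbation, analogous to a Gaussian mechanism.
    theta_unlearned = theta + noise_scale * np.random.randn(*theta.shape)
    return t_star, theta_unlearned   # FL restarts from theta_unlearned

# Usage with toy values:
ckpts = [np.zeros(5) + t for t in range(10)]
contribs = 0.1 * np.ones(10)
t_star, theta = ifu_rewind(ckpts, contribs, budget=0.45, noise_scale=0.05)
```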

    Reliability and robustness of federated learning in practical applications

    No full text
    Federated Learning (FL) has gained popularity in recent years as it enables different clients to jointly learn a global model without sharing their respective data. FL specializes the classical problem of distributed learning to account for the private nature of clients' information (i.e. data and surrogate features), and for the potential data and hardware heterogeneity across clients, which is generally unknown to the server. Within this context, the main objective of this thesis is to present new theoretical and practical results to quantify the impact of the clients' data heterogeneity on the convergence guarantees of federated learning, while investigating the feasibility of critical components for the deployment of federated learning in real-world applications. In the first part of the thesis we study the robustness and variability of federated learning under heterogeneous conditions. To this end, we introduce the notion of stochastic aggregation weights to generalize the aggregation scheme proposed in FedAvg, along with a novel theory to account asymptotically for the impact of a client sampling scheme on the federated learning convergence guarantees. We then introduce "clustered sampling", a novel client selection scheme generalizing and outperforming the state-of-the-art sampling methods in terms of improved representativity and lower variability. We provide a theoretical justification of clustered sampling, and show faster and smoother convergence as compared to the standard approaches. We further extend the stochastic aggregation scheme of clustered sampling to account for asynchronous client updates, and provide the closed-form expression of the aggregation weights for unbiased federated optimization of standard procedures such as synchronous and asynchronous federated learning, FedFix, or FedBuff. In the second part of the thesis, we investigate the reliability of federated learning in practical applications. We introduce Informed Federated Unlearning (IFU), a novel federated unlearning scheme allowing the removal (unlearning) of a client's contribution from a federated model, with statistical guarantees on the unlearning effectiveness. Finally, we propose two strategies for free-riding attacks and introduce a novel theoretical framework to prove their effectiveness. Overall, the work presented in this thesis highlights novel theoretical properties of federated learning, which ultimately deepen our understanding of the robustness and reliability of the federated optimization process in practical application scenarios.
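    The thesis's central notion of stochastic aggregation weights can be made concrete with a small sketch: FedAvg's fixed weights p_i = n_i / n are replaced by random weights w_i whose expectation equals p_i, so aggregation stays unbiased while capturing randomness from sampling or asynchrony. The Dirichlet weight model below is an illustrative choice of ours, not the thesis's exact construction.

```python
import numpy as np

def stochastic_aggregate(client_updates, n_samples, rng):
    """FedAvg generalized with stochastic aggregation weights.

    Each client i gets a random weight w_i with E[w_i] = p_i = n_i / n,
    here modeled as a Dirichlet draw whose mean is the FedAvg weight
    vector. Unbiased weights preserve FedAvg's aggregate in expectation
    while modeling client sampling or asynchronous participation.
    """
    p = np.asarray(n_samples, dtype=float)
    p /= p.sum()                              # FedAvg weights p_i
    w = rng.dirichlet(100 * p)                # random weights, E[w] = p
    return sum(wi * u for wi, u in zip(w, client_updates))

rng = np.random.default_rng(0)
updates = [rng.standard_normal(4) for _ in range(3)]
theta = stochastic_aggregate(updates, n_samples=[100, 50, 250], rng=rng)
```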

    Clustered Sampling: Low-Variance and Improved Representativity for Clients Selection in Federated Learning

    No full text
    This work addresses the problem of optimizing communications between server and clients in federated learning (FL). Current sampling approaches in FL are either biased, or non-optimal in terms of server-clients communication and training stability. To overcome this issue, we introduce clustered sampling for client selection. We prove that clustered sampling leads to better client representativity and to reduced variance of the clients' stochastic aggregation weights in FL. Consistently with our theory, we provide two different clustering approaches enabling client aggregation based on 1) sample size and 2) model similarity. Through a series of experiments in non-iid and unbalanced scenarios, we demonstrate that model aggregation through clustered sampling consistently leads to better training convergence and lower variability when compared to standard sampling approaches. Our approach does not require any additional operation on the clients' side, and can be seamlessly integrated in standard FL implementations. Finally, clustered sampling is compatible with existing methods and technologies for privacy enhancement, and for communication reduction through model compression.
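    A minimal sketch of clustered sampling based on sample size, the first of the two clustering approaches above: clients are grouped into m clusters of roughly equal total data, and each round one client is drawn per cluster with probability proportional to its share of the cluster's data. The greedy cluster construction here is a simplified variant of ours, not the paper's exact algorithm.

```python
import numpy as np

def clustered_sampling(n_samples, m, rng):
    """Sketch of clustered sampling by sample size.

    Clients are greedily partitioned into m clusters balancing total
    sample counts; one client is then drawn per cluster. Every round
    selects exactly m clients, with lower selection variance than
    drawing m clients from the full population.
    """
    n = np.asarray(n_samples, dtype=float)
    order = np.argsort(-n)                    # largest clients first
    clusters = [[] for _ in range(m)]
    loads = np.zeros(m)
    for i in order:                           # greedy balanced partition
        k = int(np.argmin(loads))
        clusters[k].append(int(i))
        loads[k] += n[i]
    selected = []
    for members in clusters:                  # one client per cluster
        probs = n[members] / n[members].sum()
        selected.append(int(rng.choice(members, p=probs)))
    return selected

rng = np.random.default_rng(0)
print(clustered_sampling([100, 80, 60, 40, 20, 10], m=3, rng=rng))
```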

    A General Theory for Federated Optimization with Asynchronous and Heterogeneous Clients Updates

    No full text
    We propose a novel framework to study asynchronous federated learning optimization with delays in gradient updates. Our theoretical framework extends the standard FedAvg aggregation scheme by introducing stochastic aggregation weights to represent the variability of the clients' update times, due for example to heterogeneous hardware capabilities. Our formalism applies to the general federated setting where clients have heterogeneous datasets and perform at least one step of stochastic gradient descent (SGD). We demonstrate convergence for such a scheme and provide sufficient conditions for the related minimum to be the optimum of the federated problem. We show that our general framework applies to existing optimization schemes including centralized learning, FedAvg, asynchronous FedAvg, and FedBuff. The theory provided here allows us to draw meaningful guidelines for designing a federated learning experiment in heterogeneous conditions. In particular, we develop FedFix, a novel extension of FedAvg enabling efficient asynchronous federated training while preserving the convergence stability of synchronous aggregation. We empirically demonstrate our theory through a series of experiments showing that asynchronous FedAvg leads to fast convergence at the expense of stability, and we finally demonstrate the improvements of FedFix over synchronous and asynchronous FedAvg.
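    One way to picture the FedFix idea, under our own assumptions about the protocol: the server aggregates at fixed intervals, folding in whichever client updates arrived during the window and down-weighting stale ones, which reduces to synchronous FedAvg when every client is on time. The function name, staleness_decay, and the geometric discounting are hypothetical choices for illustration.

```python
import numpy as np

def fedfix_round(theta, arrivals, delays, lr=1.0, staleness_decay=0.5):
    """Sketch of a FedFix-style fixed-interval aggregation step.

    theta:    current global model
    arrivals: list of (update_direction, n_samples) received during the
              fixed time window; late clients simply miss the round
    delays:   staleness of each update in rounds (0 = computed on the
              current model)
    Stale updates are down-weighted geometrically and weights are then
    normalized, illustrating how asynchrony induces stochastic
    aggregation weights.
    """
    if not arrivals:
        return theta                      # no update arrived this window
    w = np.array([n * staleness_decay ** d
                  for (_, n), d in zip(arrivals, delays)], dtype=float)
    w /= w.sum()
    step = sum(wi * u for wi, (u, _) in zip(w, arrivals))
    return theta + lr * step

theta = np.zeros(4)
arrivals = [(np.ones(4), 100), (2 * np.ones(4), 50)]
theta = fedfix_round(theta, arrivals, delays=[0, 2])
```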

    A General Theory for Client Sampling in Federated Learning

    No full text
    While client sampling is a central operation of current state-of-the-art federated learning (FL) approaches, the impact of this procedure on the convergence and speed of FL remains under-investigated. In this work, we provide a general theoretical framework to quantify the impact of a client sampling scheme and of the clients' heterogeneity on federated optimization. First, we provide a unified theoretical ground for previously reported experimental results on the relationship between FL convergence and the variance of the aggregation weights across sampling schemes. Second, we prove for the first time that the quality of FL convergence is also impacted by the resulting covariance between aggregation weights. Our theory is general, and is here applied to Multinomial Distribution (MD) and Uniform sampling, the two default unbiased client sampling schemes of FL, and demonstrated through a series of experiments in non-iid and unbalanced scenarios. Our results suggest that MD sampling should be used as the default sampling scheme, due to its resilience to changes in data ratios during the learning process, while Uniform sampling is superior only in the special case where clients hold the same amount of data.
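    The two default schemes compared above can be sketched as follows. Both are unbiased, i.e. the expected aggregate equals the full-participation FedAvg aggregate, but they differ in the variance and covariance of the resulting weights; the reweighting factors follow the standard unbiased constructions, with illustrative function names.

```python
import numpy as np

def md_sampling(n_samples, m, rng):
    """Multinomial Distribution (MD) sampling: draw m clients i.i.d.
    with probability p_i = n_i / n; each draw gets weight 1/m."""
    p = np.asarray(n_samples, dtype=float)
    p /= p.sum()
    idx = rng.choice(len(p), size=m, replace=True, p=p)
    return idx, np.full(m, 1.0 / m)

def uniform_sampling(n_samples, m, rng):
    """Uniform sampling: draw m of M clients without replacement;
    the weight (M / m) * p_i restores unbiasedness."""
    p = np.asarray(n_samples, dtype=float)
    p /= p.sum()
    M = len(p)
    idx = rng.choice(M, size=m, replace=False)
    return idx, (M / m) * p[idx]

# Both schemes satisfy E[sum_i w_i * theta_i] = sum_i p_i * theta_i.
rng = np.random.default_rng(0)
print(md_sampling([100, 50, 250, 25], m=2, rng=rng))
print(uniform_sampling([100, 50, 250, 25], m=2, rng=rng))
```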
